The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. In 1992, this value was revised to 36.8$^{\circ}$C or 98.2$^{\circ}$F.
In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.
Answer the following questions in this notebook and submit it to your GitHub account.
You can include written notes in notebook cells using Markdown:
In [1]:
%matplotlib inline
import pandas as pd
import seaborn as sns
import scipy.stats as stats
import statsmodels
import numpy as np
sns.set_style('white')
In [2]:
df = pd.read_csv('data/human_body_temperature.csv')
In [3]:
df.head()
Out[3]:
In [4]:
print(stats.normaltest(df.temperature))
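print(len(df.temperature)) # number of samples
sns.histplot(df.temperature, kde=True) # histplot replaces the deprecated distplot
# The normality test does not reject normality (p-value > 0.05),
# which is consistent with, but does not prove, a normal distribution
# The distribution also looks bell-shaped (not visibly skewed)
# The sample size is larger than 30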
Out[4]:
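`stats.normaltest` is the D'Agostino-Pearson test, which combines sample skewness and kurtosis into one statistic. As a rough illustration of what it reacts to, the sketch below computes both quantities by hand. The function name and the readings are hypothetical and are not taken from the dataset; only the standard library is used so the cell runs without the CSV file.

```python
def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from biased central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurtosis

# Hypothetical readings, symmetric around 98.2 (not the real dataset).
symmetric = [97.4, 97.8, 98.0, 98.2, 98.2, 98.4, 98.6, 99.0]
# The same readings plus one extreme high value, which creates right skew.
right_skewed = symmetric + [101.5]

skew_sym, kurt_sym = skewness_and_excess_kurtosis(symmetric)
skew_right, kurt_right = skewness_and_excess_kurtosis(right_skewed)
print("symmetric sample:    skew =", round(skew_sym, 3))
print("right-skewed sample: skew =", round(skew_right, 3))
```

A skewness near zero matches the "not skewed" impression from the plot, while a single outlier is enough to push it well above zero.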
In [5]:
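# One-sample t-test, H0: the population mean is 98.6 degrees Fahrenheit
print(df.temperature.mean()) # sample mean
stats.ttest_1samp(df.temperature, 98.6)
# p-value < 0.05, so H0 is rejected: 98.6 is not consistent with the population mean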
Out[5]:
In [6]:
# One-sample z-test, computed by hand (statsmodels.stats.weightstats.ztest is a packaged alternative)
n = len(df.temperature)
z = (df.temperature.mean() - 98.6) / (df.temperature.std() / np.sqrt(n))
print(z)
# Two-sided p-value from the standard normal distribution instead of a printed table
print(2 * stats.norm.sf(abs(z)))
# p-value < 0.05, so H0 is rejected: 98.6 is not consistent with the population mean
# In practice it doesn't matter here whether you use the t-test or the z-test; both reject H0.
# With n > 30 the z-test is the common textbook choice, although the t-test is the stricter one
# because the population standard deviation is unknown.
In [7]:
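# Central 95% range of the observed temperatures (2.5th and 97.5th percentiles)
print(np.percentile(df.temperature, [2.5, 97.5]))
# Note: this describes individual readings; it is not a confidence interval for the mean
# A reading below the first number or above the second number can be considered abnormal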
In [8]:
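# Distance from the sample mean to the 97.5th percentile
np.percentile(df.temperature, [2.5, 97.5])[1] - df.temperature.mean()
# This gives 1.23 degrees Fahrenheit. It is the upper half-width of the 95% data range,
# not the margin of error of the mean (a critical value times std / sqrt(n)).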
Out[8]:
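For comparison, the margin of error of the sample mean is usually defined as a critical value times the standard error, and the confidence interval is the mean plus or minus that margin. The sketch below is a minimal version under stated assumptions: it uses the normal critical value (reasonable for large samples, while a t critical value would be stricter for small ones), the function name is hypothetical, and the readings are made up rather than taken from the dataset.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Return (margin_of_error, lower, upper) for the sample mean (z-based)."""
    n = len(sample)
    sample_mean = mean(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev divides by n - 1
    # Two-sided critical value, about 1.96 for 95% confidence.
    z_critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_critical * standard_error
    return margin, sample_mean - margin, sample_mean + margin

# Hypothetical readings in degrees Fahrenheit (not the real dataset).
readings = [97.4, 97.8, 98.0, 98.2, 98.2, 98.4, 98.6, 99.0]
margin, lower, upper = mean_confidence_interval(readings)
print(f"mean = {mean(readings):.2f}, margin of error = {margin:.2f}")
print(f"95% CI for the mean: [{lower:.2f}, {upper:.2f}]")
```

With the notebook's data, the same function could be called as `mean_confidence_interval(list(df.temperature))`. Because the standard error shrinks with the square root of the sample size, this interval is much narrower than the percentile range of individual readings.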
In [9]:
from scipy.stats import ttest_ind
males = df[df.gender == 'M'].temperature
females = df[df.gender == 'F'].temperature
print(len(males), len(females)) # group sizes
ttest_ind(males, females)
# A two-sample t-test is used here; with these sample sizes a z-test gives similar results.
# p-value < 0.05, hence a significant difference between male and female normal temperatures.
Out[9]:
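A two-sample z statistic can also be computed by hand: the difference in means divided by the unpooled standard error, with the p-value taken from the standard normal distribution. The sketch below is one way to write it under those assumptions; the function name and both groups of readings are hypothetical, not values from the dataset.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def two_sample_z_test(sample_a, sample_b):
    """Return (z, two-sided p-value) for H0: both population means are equal."""
    # Unpooled standard error of the difference in means.
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    # Two-sided p-value; 1 - cdf loses precision for very large |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical readings in degrees Fahrenheit (not the real dataset).
group_a = [97.9, 98.0, 98.1, 98.2, 98.3, 98.1, 98.0, 98.2]
group_b = [98.3, 98.4, 98.5, 98.6, 98.4, 98.5, 98.3, 98.6]

z, p_value = two_sample_z_test(group_a, group_b)
print("z =", round(z, 2), "p-value =", p_value)
```

With the notebook's data, the call would be `two_sample_z_test(list(males), list(females))`, where `males` and `females` stand for the two temperature columns split by gender. For groups this small a t-test is the safer choice; the z version is shown only because the z and t results converge as the samples grow.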